14 research outputs found

    Speaker-Independent Mel-cepstrum Estimation from Articulator Movements Using D-vector Input

    New Grapheme Generation Rules for Two-Stage Model-based Grapheme-to-Phoneme Conversion

    The precise conversion of arbitrary text into its corresponding phoneme sequence (grapheme-to-phoneme or G2P conversion) is used in speech synthesis and recognition, pronunciation learning software, spoken term detection, and spoken document retrieval systems. Because the quality of this module plays an important role in the performance of such systems, and because many problems with G2P conversion have been reported, we propose a novel two-stage model-based approach to improve the performance of the G2P conversion model. The approach is implemented using an existing weighted finite-state transducer-based G2P conversion framework. The first-stage model automatically converts words to phonemes, while the second-stage model uses the input graphemes and the output phonemes obtained from the first stage to determine the best final output phoneme sequence. Additionally, we designed new grapheme generation rules that provide extra detail for the vowel and consonant graphemes appearing within a word. Compared with previous approaches, the evaluation results indicate that our approach, using rules that focus on the vowel graphemes, slightly improved accuracy on the out-of-vocabulary dataset and consistently increased accuracy on the in-vocabulary dataset.

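    Articulatory-to-Acoustic Conversion from rtMRI Data Using Transposed Convolutional Neural Networks (転置畳み込みニューラルネットワークを用いたrtMRIデータからの調音-音響変換)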

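    Tokyo University of Science. Conference: Language Resources Workshop 2021 (言語資源活用ワークショップ2021). Venue: online. Dates: 13-14 September 2021. Organizer: Center for Corpus Development, National Institute for Japanese Language and Linguistics. This paper proposes a deep learning model for generating acoustic features from rtMRI data. Because rtMRI records the entire set of articulators at high resolution, it is considered useful as source data for generating acoustic features from articulatory data, but its frame rate is relatively low. We therefore propose a method that uses a transposed convolutional network to perform super-resolution along the time axis. Whereas a standard convolutional neural network mainly compresses the neighborhood information of an image through convolution, a transposed convolutional network performs the reverse operation and thereby increases the image resolution. Our method applies this super-resolution processing along the time dimension of the rtMRI data to increase its temporal resolution. Experiments using mel-cepstral distortion and PESQ as evaluation measures showed that the transposed convolutional network is effective for generating accurate acoustic features. We also confirmed that raising the super-resolution factor improves the PESQ score.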
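
    As a rough illustration of the temporal super-resolution idea in the abstract above, the sketch below applies a one-dimensional transposed convolution along the time axis of a short sequence. It is not the authors' model: the function name `transposed_conv1d`, the fixed triangular kernel, and the stride of 2 are assumptions chosen for readability, whereas a trained network would learn its kernel weights and operate on full rtMRI frames.

```python
def transposed_conv1d(frames, kernel, stride):
    """Upsample a 1-D sequence in time with a transposed convolution.

    Each input value is multiplied by the whole kernel and the result is
    added into the output at an offset of `stride` per input step, which is
    the reverse of the way a strided convolution gathers its inputs.
    """
    out_len = (len(frames) - 1) * stride + len(kernel)
    out = [0.0] * out_len
    for t, value in enumerate(frames):
        for k, weight in enumerate(kernel):
            out[t * stride + k] += value * weight
    return out


# Hypothetical per-frame feature values (one scalar per low-rate frame).
frames = [1.0, 3.0, 5.0]
# Assumed fixed triangular kernel; with stride 2 it interpolates linearly.
kernel = [0.5, 1.0, 0.5]

upsampled = transposed_conv1d(frames, kernel, stride=2)
print(upsampled)
```

    With this kernel, the original values reappear at every second output position and the positions in between hold their averages, so the interior of the sequence has twice the time resolution. The first and last output samples are edge effects of the kernel overlap.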

    A Model of Belief Formation Based on Causality and Application to N-armed Bandit Problem

    Development of a Toolkit for Spoken Dialog Systems with an Anthropomorphic Agent: Galatea

    The Interactive Speech Technology Consortium (ISTC) has been developing a toolkit called Galatea that comprises four fundamental modules for speech recognition, speech synthesis, face synthesis, and dialog control, which can be used to realize an interface for spoken dialog systems with an anthropomorphic agent. This paper describes the development of the Galatea toolkit and the functions of each module; in addition, it discusses the standardization of the description of multi-modal interactions. APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference. 4-7 October 2009. Sapporo, Japan. Oral session: Infrastructure Software for Speech Processing (5 October 2009).